1. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a first instruction stream residing in the memory; and
a profile-based loop optimizer residing in the memory and executed by the at least one processor, the loop optimizer inserting instrumentation code into the first instruction stream that collects profile data in at least one execution frequency table and thereby generating a second instruction stream, each execution frequency table indicating values representative of the number of times a corresponding loop is executed each time the loop is entered.

2. The apparatus of claim 1 wherein each execution frequency table includes a plurality of entries, each entry containing:
(1) a value representative of the number of times a loop is executed each time the loop is entered; and
(2) a count of the occurrences of each value when the second instruction stream is executed.
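
As an illustration only, the table of claim 2 can be pictured as a histogram keyed by iteration count. The following minimal Python sketch assumes a hypothetical class name (`ExecutionFrequencyTable`) and hypothetical method names; none of them come from the claims.

```python
from collections import Counter


class ExecutionFrequencyTable:
    """Hypothetical table: each entry pairs an iteration count (the value)
    with the number of loop entries that produced that count."""

    def __init__(self):
        self._counts = Counter()

    def record(self, iterations):
        # One loop entry finished after `iterations` executions of the loop.
        self._counts[iterations] += 1

    def entries(self):
        # (value, count) pairs, ordered by value.
        return sorted(self._counts.items())


table = ExecutionFrequencyTable()
for trip_count in [3, 3, 1, 3, 8]:
    table.record(trip_count)

print(table.entries())  # [(1, 1), (3, 3), (8, 1)]
```

Here the loop was entered five times; it ran three iterations on three of those entries, one iteration once, and eight iterations once.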

3. The apparatus of claim 1 wherein the profile-based loop optimizer uses values in the at least one execution frequency table to peel at least one loop in the first instruction stream.

4. The apparatus of claim 1 wherein the profile-based loop optimizer uses values in the at least one execution frequency table to unroll at least one loop in the first instruction stream.

5. The apparatus of claim 1 wherein the profile-based loop optimizer uses values in the at least one execution frequency table to peel and unroll at least one loop in the first instruction stream.
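
For readers unfamiliar with the two transformations named in claims 3 through 5, the following Python sketch shows one common way to peel and to unroll a simple summation loop by hand. The peel depth of two and the unroll factor of four are arbitrary assumptions chosen for illustration, and the function names are hypothetical; a real optimizer would apply such rewrites to the instruction stream rather than to source code.

```python
def total_original(values):
    """Reference loop: one iteration per element."""
    total = 0
    for value in values:
        total += value
    return total


def total_peeled(values):
    """First two iterations peeled out as straight-line code.

    This favors loops that usually run only once or twice."""
    total = 0
    n = len(values)
    if n > 0:
        total += values[0]
    if n > 1:
        total += values[1]
    for i in range(2, n):  # residual loop for the less common long case
        total += values[i]
    return total


def total_unrolled(values):
    """Loop body replicated four times per pass, plus a cleanup loop.

    This favors loops that usually run many iterations."""
    total = 0
    n = len(values)
    i = 0
    while i + 4 <= n:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    while i < n:  # handle the 0 to 3 leftover iterations
        total += values[i]
        i += 1
    return total
```

All three functions return the same result for any input list; only the shape of the control flow differs.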

6. The apparatus of claim 1 wherein the instrumentation code comprises:
code to allocate a loop iteration counter for a selected loop;
code to allocate the execution frequency table for the selected loop;
code to clear the loop iteration counter on all entry paths to the selected loop;
code to increment the loop iteration counter in a header block for the selected loop; and
code to read the loop iteration counter and update the execution frequency table along all exit paths from the selected loop.
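
The five pieces of instrumentation listed in claim 6 can be sketched by instrumenting a small loop by hand. This is a source-level Python illustration under stated assumptions: the function `find_first_negative` and the name `freq_table` are hypothetical, and the top of the loop body stands in for the header block, so the counter equals the number of body executions.

```python
from collections import Counter

# Allocate the execution frequency table for the selected loop (hypothetical name).
freq_table = Counter()


def find_first_negative(values):
    """A loop with two exit paths, instrumented by hand."""
    counter = 0  # allocate the counter and clear it on the entry path
    for index, value in enumerate(values):
        counter += 1  # increment in the loop header
        if value < 0:
            freq_table[counter] += 1  # update the table on the early-exit path
            return index
    freq_table[counter] += 1  # update the table on the normal exit path
    return -1


find_first_negative([4, -1, 7])  # exits early after 2 iterations
find_first_negative([1, 2, 3])   # runs all 3 iterations
find_first_negative([])          # loop entered, body never runs

print(sorted(freq_table.items()))  # [(0, 1), (2, 1), (3, 1)]
```

Updating the table on every exit path matters: if the early `return` skipped the update, short executions would be missing from the profile.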

7. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a first instruction stream residing in the memory; and
a profile-based loop optimizer residing in the memory and executed by the at least one processor, the loop optimizer optimizing at least one loop in the first instruction stream according to profile data stored in at least one execution frequency table, each execution frequency table indicating values representative of the number of times a corresponding loop is executed each time the loop is entered.

8. The apparatus of claim 7 wherein each execution frequency table includes a plurality of entries, each entry containing:
(1) a value representative of the number of times a loop is executed each time the loop is entered; and
(2) a count of the occurrences of each value when the first instruction stream is executed.

9. The apparatus of claim 7 wherein the profile-based loop optimizer uses values in the at least one execution frequency table to peel at least one loop in the first instruction stream.

10. The apparatus of claim 9 wherein the profile-based loop optimizer peels a selected loop if one of the following conditions is true:
(A) the execution frequency table for the selected loop has a dominant mode that is less than a specified peeling threshold; or
(B) the execution frequency table for the selected loop does not have a dominant mode, and most of the execution frequencies in the execution frequency table are smaller than the specified peeling threshold.

11. The apparatus of claim 10 wherein the profile-based loop optimizer unrolls the selected loop if:
neither (A) nor (B) is true; and
most of the execution frequencies in the execution frequency table for the selected loop are greater than a specified unrolling threshold.
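
The decision rule of claims 10 and 11 can be sketched as follows. The claims do not define "dominant mode" or "most", so this Python sketch makes explicit assumptions: a dominant mode is a value that accounts for more than half of all recorded loop entries, "most" means more than half of the entries weighted by count, and the thresholds 4 and 16 are placeholder values. All function and parameter names are hypothetical.

```python
def dominant_mode(table, share=0.5):
    """Return the most frequent value if it exceeds `share` of all entries
    (an assumed definition), otherwise None."""
    total = sum(table.values())
    if total == 0:
        return None
    value, count = max(table.items(), key=lambda item: item[1])
    return value if count / total > share else None


def mostly(table, predicate):
    """True if more than half of the recorded loop entries satisfy `predicate`."""
    total = sum(table.values())
    hits = sum(count for value, count in table.items() if predicate(value))
    return total > 0 and hits / total > 0.5


def choose_transformation(table, peel_threshold=4, unroll_threshold=16):
    """Map a {iteration count: occurrences} table to 'peel', 'unroll' or 'none'."""
    mode = dominant_mode(table)
    cond_a = mode is not None and mode < peel_threshold
    cond_b = mode is None and mostly(table, lambda v: v < peel_threshold)
    if cond_a or cond_b:
        return "peel"
    if mostly(table, lambda v: v > unroll_threshold):
        return "unroll"
    return "none"


print(choose_transformation({2: 90, 50: 10}))                # peel (condition A)
print(choose_transformation({1: 30, 2: 30, 3: 30, 100: 10})) # peel (condition B)
print(choose_transformation({64: 80, 2: 20}))                # unroll
print(choose_transformation({8: 100}))                       # none
```

The last case shows a loop that falls between the two thresholds and is left unchanged by this sketch.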

12. The apparatus of claim 7 wherein the profile-based loop optimizer uses values in the at least one execution frequency table to unroll at least one loop in the first instruction stream.

13. The apparatus of claim 7 wherein the profile-based loop optimizer uses values in the at least one execution frequency table to peel and unroll at least one loop in the first instruction stream.

14. The apparatus of claim 7 wherein the profile-based loop optimizer determines whether to peel or unroll a loop based on a dominant mode, if present, in the execution frequency table corresponding to the loop.

15. An apparatus comprising:
(A) at least one processor;
(B) a memory coupled to the at least one processor;
(C) a first instruction stream residing in the memory; and
(D) a profile-based loop optimizer residing in the memory and executed by the at least one processor, the loop optimizer inserting instrumentation code into the first instruction stream that collects profile data in at least one execution frequency table and thereby generating a second instruction stream, wherein each execution frequency table includes a plurality of entries, each entry containing:
(D1) a value representative of the number of times a loop is executed each time the loop is entered; and
(D2) a count of the occurrences of each value when the second instruction stream is executed;
(E) wherein the instrumentation code comprises:
(E1) code to allocate a loop iteration counter for a selected loop;
(E2) code to allocate the execution frequency table for the selected loop;
(E3) code to clear the loop iteration counter on all entry paths to the selected loop;
(E4) code to increment the loop iteration counter in a header block for the selected loop; and
(E5) code to read the loop iteration counter and update the execution frequency table along all exit paths from the selected loop;
(F) the loop optimizer optimizing a loop in the first instruction stream according to profile data stored in the at least one execution frequency table by peeling the loop, unrolling the loop, or both peeling and unrolling the loop based on profile data stored in the execution frequency table corresponding to the loop.

16. The apparatus of claim 15 wherein the profile-based loop optimizer peels a selected loop if one of the following conditions is true:
(A) the execution frequency table for the selected loop has a dominant mode that is less than a specified peeling threshold; or
(B) the execution frequency table for the selected loop does not have a dominant mode, and most of the execution frequencies in the execution frequency table are smaller than the specified peeling threshold.

17. The apparatus of claim 16 wherein the profile-based loop optimizer unrolls the selected loop if:
neither (A) nor (B) is true; and
most of the execution frequencies in the execution frequency table for the selected loop are greater than a specified unrolling threshold.

18. A method for instrumenting a first instruction stream comprising the steps of:
inserting code into the first instruction stream for a selected loop that defines at least one execution frequency table for the selected loop, each execution frequency table indicating values representative of the number of times the selected loop is executed each time the selected loop is entered; and
inserting code into the first instruction stream that updates the execution frequency table according to the number of times the selected loop is executed each time the loop is entered.

19. A method for instrumenting a first instruction stream comprising the steps of:
inserting code that allocates a loop iteration counter for a selected loop in the first instruction stream;
inserting code that allocates an execution frequency table that corresponds to the selected loop;
inserting code to clear the loop iteration counter on all entry paths to the selected loop;
inserting code to increment the loop iteration counter in a header block for the selected loop; and
inserting code to read the loop iteration counter and update the execution frequency table along all exit paths from the selected loop according to the number of times the selected loop is executed each time the loop is entered.

20. A method for optimizing at least one loop in a first instruction stream, the method comprising the steps of:
inserting instrumentation code into the first instruction stream that collects profile data in at least one execution frequency table and thereby generating a second instruction stream, each execution frequency table indicating values representative of the number of times a corresponding loop is executed each time the loop is entered; and
optimizing at least one loop in the first instruction stream according to profile data stored in at least one execution frequency table.

21. The method of claim 20 wherein each execution frequency table includes a plurality of entries, each entry containing:
(1) a value representative of the number of times a loop is executed each time the loop is entered; and
(2) a count of the occurrences of each value when the first instruction stream is executed.

22. The method of claim 20 further comprising the step of using values in the at least one execution frequency table to peel at least one loop in the first instruction stream.

23. The method of claim 20 wherein the step of using values in the at least one execution frequency table to peel at least one loop in the first instruction stream peels a selected loop if one of the following conditions is true:
(A) the execution frequency table for the selected loop has a dominant mode that is less than a specified peeling threshold; or
(B) the execution frequency table for the selected loop does not have a dominant mode, and most of the execution frequencies in the execution frequency table are smaller than the specified peeling threshold.

24. The method of claim 23 further comprising the step of unrolling the selected loop if:
neither (A) nor (B) is true; and
most of the execution frequencies in the execution frequency table for the selected loop are greater than a specified unrolling threshold.

25. The method of claim 20 further comprising the step of using values in the at least one execution frequency table to unroll at least one loop in the first instruction stream.

26. The method of claim 20 further comprising the step of using values in the at least one execution frequency table to peel and unroll at least one loop in the first instruction stream.

27. The method of claim 20 further comprising the step of determining whether to peel or unroll a loop based on a dominant mode, if present, in the execution frequency table corresponding to the loop.

28. A method for optimizing a plurality of loops in a first instruction stream, the method comprising the steps of:
(A) inserting code that allocates a loop iteration counter for at least one loop in the first instruction stream;
(B) inserting code that allocates an execution frequency table that corresponds to a loop in the first instruction stream, wherein each execution frequency table includes a plurality of entries, each entry containing:
(1) a value representative of the number of times a loop is executed each time the loop is entered; and
(2) a count of the occurrences of each value;
(C) inserting code to clear the loop iteration counter on all entry paths to the selected loop;
(D) inserting code to increment the loop iteration counter in a header block for the selected loop; and
(E) inserting code to read the loop iteration counter and update the execution frequency table along all exit paths from the selected loop according to the number of times the selected loop is executed each time the loop is entered;
(F) the inserting code in steps (A) through (E) generating a second instruction stream;
(G) executing the second instruction stream with sample inputs to collect profile data in the at least one execution frequency table;
(H) using values in the at least one execution frequency table to peel at least one loop in the first instruction stream based on profile data stored in the execution frequency table corresponding to the loop; and
(I) using values in the at least one execution frequency table to unroll at least one loop in the first instruction stream based on profile data stored in the execution frequency table corresponding to the loop.
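
Step (G) of claim 28, running the instrumented stream on sample inputs, can be sketched as follows. In this minimal Python illustration, `instrumented_search` is a hypothetical stand-in for the second instruction stream, and the sample inputs are invented for the example; the resulting table is what steps (H) and (I) would then consult.

```python
from collections import Counter


def instrumented_search(values, target, table):
    """Stand-in for the second instruction stream: a linear search whose
    loop records its iteration count in `table` when the loop is left."""
    counter = 0  # cleared on loop entry
    for value in values:
        counter += 1  # incremented once per iteration
        if value == target:
            break
    # Both exit paths (the break and the normal fall-through) reach this update.
    table[counter] += 1
    return counter


def collect_profile(sample_inputs):
    """Execute the instrumented code once per sample input and return the table."""
    table = Counter()
    for values, target in sample_inputs:
        instrumented_search(values, target, table)
    return table


samples = [([5, 1, 9], 5), ([7, 7], 7), ([2, 4, 6, 8], 8), ([3], 3)]
profile = collect_profile(samples)

print(sorted(profile.items()))  # [(1, 3), (4, 1)]
```

With these samples the loop ends after one iteration on three of four entries, a profile that would point toward peeling under the thresholds a given optimizer chooses.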

29. The method of claim 28 wherein the step of using values in the at least one execution frequency table to peel at least one loop in the first instruction stream peels a selected loop if one of the following conditions is true:
(A) the execution frequency table for the selected loop has a dominant mode that is less than a specified peeling threshold; or
(B) the execution frequency table for the selected loop does not have a dominant mode, and most of the execution frequencies in the execution frequency table are smaller than the specified peeling threshold.

30. The method of claim 29 wherein the step of using values in the at least one execution frequency table to unroll at least one loop in the first instruction stream unrolls the selected loop if:
neither (A) nor (B) is true; and
most of the execution frequencies in the execution frequency table for the selected loop are greater than a specified unrolling threshold.

31. A program product comprising:
(A) a profile-based loop optimizer that inserts instrumentation code into a first instruction stream that collects profile data in at least one execution frequency table and thereby generates a second instruction stream, each execution frequency table indicating values representative of the number of times a corresponding loop is executed each time the loop is entered; and
(B) computer-readable signal bearing media bearing the profile-based loop optimizer.

32. The program product of claim 31 wherein the computer-readable signal bearing media comprises recordable media.

33. The program product of claim 31 wherein the computer-readable signal bearing media comprises transmission media.

34. The program product of claim 31 wherein each execution frequency table includes a plurality of entries, each entry containing:
(1) a value representative of the number of times a loop is executed each time the loop is entered; and
(2) a count of the occurrences of each value when the second instruction stream is executed.



35. The program product of claim 31 wherein the profile-based loop optimizer uses values in the at least one execution frequency table to peel at least one loop in the first instruction stream.

36. The program product of claim 31 wherein the profile-based loop optimizer uses values in the at least one execution frequency table to unroll at least one loop in the first instruction stream.

37. The program product of claim 31 wherein the profile-based loop optimizer uses values in the at least one execution frequency table to peel and unroll at least one loop in the first instruction stream.

38. The program product of claim 31 wherein the instrumentation code comprises:
code to allocate a loop iteration counter for a selected loop;
code to allocate the execution frequency table for the selected loop;
code to clear the loop iteration counter on all entry paths to the selected loop;
code to increment the loop iteration counter in a header block for the selected loop; and
code to read the loop iteration counter and update the execution frequency table along all exit paths from the selected loop.

39. A program product comprising:
(A) a profile-based loop optimizer that optimizes at least one loop in a first instruction stream according to profile data stored in at least one execution frequency table, each execution frequency table indicating values representative of the number of times a corresponding loop is executed each time the loop is entered; and
(B) computer-readable signal bearing media bearing the profile-based loop optimizer.

40. The program product of claim 39 wherein the computer-readable signal bearing media comprises recordable media.

41. The program product of claim 39 wherein the computer-readable signal bearing media comprises transmission media.

42. The program product of claim 39 wherein each execution frequency table includes a plurality of entries, each entry containing:
(1) a value representative of the number of times a loop is executed each time the loop is entered; and
(2) a count of the occurrences of each value when the first instruction stream is executed.



43. The program product of claim 39 wherein the profile-based loop optimizer uses values in the at least one execution frequency table to peel at least one loop in the first instruction stream.

44. The program product of claim 39 wherein the profile-based loop optimizer peels a selected loop if one of the following conditions is true:
(A) the execution frequency table for the selected loop has a dominant mode that is less than a specified peeling threshold; or
(B) the execution frequency table for the selected loop does not have a dominant mode, and most of the execution frequencies in the execution frequency table are smaller than the specified peeling threshold.

45. The program product of claim 44 wherein the profile-based loop optimizer unrolls the selected loop if:
neither (A) nor (B) is true; and
most of the execution frequencies in the execution frequency table for the selected loop are greater than a specified unrolling threshold.

46. The program product of claim 39 wherein the profile-based loop optimizer uses values in the at least one execution frequency table to unroll at least one loop in the first instruction stream.

47. The program product of claim 39 wherein the profile-based loop optimizer uses values in the at least one execution frequency table to peel and unroll at least one loop in the first instruction stream.

48. The program product of claim 39 wherein the profile-based loop optimizer determines whether to peel or unroll a loop based on a dominant mode, if present, in the execution frequency table corresponding to the loop.

49. A program product comprising:
(A) a profile-based loop optimizer that inserts instrumentation code into a first instruction stream that collects profile data in at least one execution frequency table and thereby generating a second instruction stream, wherein each execution frequency table includes a plurality of entries, each entry containing:
a value representative of the number of times a loop is executed each time the loop is entered; and
a count of the occurrences of each value when the second instruction stream is executed;
wherein the instrumentation code comprises:
code to allocate a loop iteration counter for a selected loop;
code to allocate the execution frequency table for the selected loop;
code to clear the loop iteration counter on all entry paths to the selected loop;
code to increment the loop iteration counter in a header block for the selected loop; and
code to read the loop iteration counter and update the execution frequency table along all exit paths from the selected loop;
the loop optimizer optimizing a loop in the first instruction stream according to profile data stored in the at least one execution frequency table by peeling the loop, unrolling the loop, or both peeling and unrolling the loop based on profile data stored in the execution frequency table corresponding to the loop; and
(B) computer-readable signal bearing media bearing the profile-based loop optimizer.

50. The program product of claim 49 wherein the computer-readable signal bearing media comprises recordable media.



Docket No. ROC920010171US1



51. The program product of claim 49 wherein the computer-readable signal bearing media comprises transmission media.



52. The program product of claim 49 wherein the profile-based loop optimizer peels a selected loop if one of the following conditions is true:
(A) the execution frequency table for the selected loop has a dominant mode that is less than a specified peeling threshold; or
(B) the execution frequency table for the selected loop does not have a dominant mode, and most of the execution frequencies in the execution frequency table are smaller than the specified peeling threshold.

53. The program product of claim 52 wherein the profile-based loop optimizer unrolls the selected loop if:
neither (A) nor (B) is true; and
most of the execution frequencies in the execution frequency table for the selected loop are greater than a specified unrolling threshold.